 batch normalisation



Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning

Chen, Hongyao, Xu, Tianyang, Wu, Xiaojun, Kittler, Josef

arXiv.org Artificial Intelligence

Batch Normalisation (BN) is widely used in conventional deep neural network training to harmonise the input-output distributions for each batch of data. However, federated learning, a distributed learning paradigm, faces the challenge of non-independent and identically distributed data across the client nodes. Because there is no coherent methodology for updating BN statistical parameters in this setting, standard BN degrades federated learning performance, and an alternative normalisation solution for federated learning is needed. In this work, we resolve the dilemma of the BN layer in federated learning by developing a customised normalisation approach, Hybrid Batch Normalisation (HBN). HBN separates the update of statistical parameters (i.e., the means and variances used for evaluation) from that of learnable parameters (i.e., parameters that require gradient updates), obtaining unbiased estimates of the global statistical parameters in distributed scenarios. In contrast with existing solutions, we emphasise the supportive power of global statistics for federated learning. The HBN layer introduces a learnable hybrid distribution factor, allowing each computing node to adaptively mix the statistical parameters of the current batch with the global statistics. HBN can serve as a powerful plugin to advance federated learning performance, showing promising merits across a wide range of federated learning settings, especially for small batch sizes and heterogeneous data.
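As a rough illustration of the mixing idea described in the abstract, the sketch below blends the current batch's statistics with global statistics via a hybrid factor `alpha`. The function name, the 1-D setting, and the exact blending rule are assumptions for illustration; they are not taken from the paper, which treats `alpha` as a learnable per-layer parameter inside a full network.

```python
import math

def hybrid_batch_norm(x, global_mean, global_var, alpha, eps=1e-5):
    """Hybrid-normalisation sketch: blend the current batch's statistics
    with global (server-aggregated) statistics using a hybrid factor
    alpha in [0, 1].  alpha = 1 recovers standard BN behaviour;
    alpha = 0 normalises purely with the global statistics."""
    n = len(x)
    batch_mean = sum(x) / n
    batch_var = sum((v - batch_mean) ** 2 for v in x) / n
    mean = alpha * batch_mean + (1 - alpha) * global_mean
    var = alpha * batch_var + (1 - alpha) * global_var
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# With alpha = 1 the layer behaves like plain BN on this batch:
# the output has (approximately) zero mean and unit variance.
out = hybrid_batch_norm([1.0, 2.0, 3.0, 4.0],
                        global_mean=0.0, global_var=1.0, alpha=1.0)
```

With `alpha = 0` every node would normalise with the same global statistics, which is what makes small, heterogeneous client batches less harmful.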


Reviews: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Neural Information Processing Systems

The suggested reparametrisation and its theoretical analysis are very interesting and I enjoyed reading the paper. However, some points in the theoretical analysis could be improved. The paper argues that the new parametrisation improves the conditioning of the gradient, but neither a strong theoretical argument nor an empirical demonstration of this is given. In line 127 it is said "Empirically, we find that w is often (close to) a dominant eigenvector of the covariance matrix C", but the corresponding experiments are shown neither in the paper nor in the supplemental material. In lines 122/123 the authors claim "It has been observed that neural networks with batch normalization also have this property (to be relatively insensitive to different learning rates), which can be explained by this analysis." However, it did not become clear to me how the analysis of the previous sections can be directly transferred to batch normalisation.
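For context, the reparameterisation under review is simple to state: the weight vector is written as w = (g / ||v||) v, decoupling its direction (v) from its magnitude (g). A minimal sketch:

```python
import math

def weight_norm(v, g):
    """Weight normalisation: reparameterise w = (g / ||v||) * v,
    so the scalar g carries the magnitude and v only the direction."""
    norm = math.sqrt(sum(c * c for c in v))
    return [g / norm * c for c in v]

# ||v|| = 5, so w ≈ [1.2, 1.6] and ||w|| = g = 2.
w = weight_norm([3.0, 4.0], g=2.0)
```

Gradients are then taken with respect to g and v separately, which is where the conditioning argument questioned above comes in.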


Evaluation of Data Augmentation and Loss Functions in Semantic Image Segmentation for Drilling Tool Wear Detection

Schlager, Elke, Windisch, Andreas, Hanna, Lukas, Klünsner, Thomas, Hagendorfer, Elias Jan, Teppernegg, Tamara

arXiv.org Artificial Intelligence

Tool wear monitoring is crucial for quality control and cost reduction in manufacturing processes, of which drilling applications are one example. In this paper, we present a U-Net based semantic image segmentation pipeline, deployed on microscopy images of cutting inserts, for the purpose of wear detection. The wear area is differentiated into two different types, resulting in a multiclass classification problem. Joining the two wear types into one general wear class, on the other hand, allows the problem to be formulated as a binary classification task. Apart from the comparison of the binary and multiclass problems, different loss functions, i.e., Cross Entropy, Focal Cross Entropy, and a loss based on the Intersection over Union (IoU), are also investigated. Furthermore, models are trained on image tiles of different sizes, and augmentation techniques of varying intensities are deployed. We find that the best performing models are binary models, trained on data with moderate augmentation and an IoU-based loss function.
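The IoU-based loss mentioned in the abstract is commonly realised as a soft Jaccard loss on per-pixel probabilities; the sketch below shows that common form for the binary case (the paper's exact formulation may differ, and the flattened-pixel-list setting is an assumption for brevity).

```python
def soft_iou_loss(pred, target, eps=1e-6):
    """Soft Jaccard (IoU-based) loss for binary segmentation:
    1 - intersection / union, computed on soft predictions in [0, 1]
    over a flattened list of pixels."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)

# A perfect prediction gives (almost) zero loss.
loss = soft_iou_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
```

Unlike cross entropy, this loss directly optimises the overlap metric used for evaluation, which is one reason it can win on imbalanced wear masks.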


Feedback-Gated Rectified Linear Units

Kemmerling, Marco

arXiv.org Artificial Intelligence

Feedback connections play a prominent role in the human brain but have not received much attention in artificial neural network research. Here, a biologically inspired feedback mechanism which gates rectified linear units is proposed. On the MNIST dataset, autoencoders with feedback show faster convergence, better performance, and more robustness to noise compared to their counterparts without feedback. Some benefits, although less pronounced and less consistent, can be observed when networks with feedback are applied on the CIFAR-10 dataset.
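One plausible reading of the gating mechanism described above is that a feedback signal from a higher layer multiplicatively modulates a unit's rectified activation; the sketch below illustrates that reading. The sigmoid squashing and the exact multiplicative form are assumptions for illustration, not the paper's precise rule.

```python
import math

def feedback_gated_relu(x, feedback):
    """Illustrative feedback-gated ReLU: a feed-forward pre-activation x
    is rectified and then gated by a feedback signal from a higher
    layer, squashed to (0, 1) here with a sigmoid."""
    gate = 1.0 / (1.0 + math.exp(-feedback))  # sigmoid of the feedback
    return max(0.0, x) * gate

# Neutral feedback (0) gives gate = 0.5, so y = relu(2.0) * 0.5 = 1.0.
y = feedback_gated_relu(2.0, feedback=0.0)
```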


Entangled q-Convolutional Neural Nets

Anagiannis, Vassilis, Cheng, Miranda C. N.

arXiv.org Machine Learning

We introduce a machine learning model, the q-CNN model, sharing key features with convolutional neural networks and admitting a tensor network description. As examples, we apply q-CNN to the MNIST and Fashion-MNIST classification tasks. We explain how the network associates a quantum state to each classification label, and study the entanglement structure of these network states. In our experiments on both datasets, we observe a distinct increase in both the left/right as well as the up/down bipartition entanglement entropy during training as the network learns the fine features of the data. More generally, we observe a universal negative correlation between the value of the entanglement entropy and the value of the cost function, suggesting that the network needs to learn the entanglement structure in order to perform the task accurately. This supports the possibility of exploiting the entanglement structure as a guide to design machine learning algorithms suitable for given tasks.
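The bipartition entanglement entropy studied above is a standard quantity: reshape the pure state across the cut, take its singular values (the Schmidt coefficients), and compute the entropy of their squares. A minimal sketch, assuming a normalised state vector (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def bipartition_entropy(psi, dim_left):
    """Von Neumann entanglement entropy of a pure state across a
    bipartition: reshape into a (dim_left x dim_right) matrix, take
    singular values s (Schmidt coefficients), and return -sum p ln p
    for p = s**2."""
    m = np.asarray(psi).reshape(dim_left, -1)
    s = np.linalg.svd(m, compute_uv=False)
    p = s ** 2
    p = p[p > 1e-12]  # drop numerically zero Schmidt weights
    return float(-np.sum(p * np.log(p)))

# The Bell state (|00> + |11>)/sqrt(2) is maximally entangled: S = ln 2.
bell = [1 / np.sqrt(2), 0.0, 0.0, 1 / np.sqrt(2)]
entropy = bipartition_entropy(bell, dim_left=2)
```

A product state such as |00> gives zero entropy under the same computation, which is the baseline against which the training-time increase is measured.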


On Batch Normalisation for Approximate Bayesian Inference

Mukhoti, Jishnu, Dokania, Puneet K., Torr, Philip H. S., Gal, Yarin

arXiv.org Machine Learning

We study batch normalisation in the context of variational inference methods in Bayesian neural networks, such as mean-field or MC Dropout. We show that batch normalisation does not affect the optimum of the evidence lower bound (ELBO). Furthermore, we study the Monte Carlo Batch Normalisation (MCBN) algorithm, proposed as an approximate inference technique parallel to MC Dropout, and show that for larger batch sizes, MCBN fails to capture epistemic uncertainty. Finally, we provide insights into what is required to fix this failure, namely having to view the mini-batch size as a variational parameter in MCBN. We comment on the asymptotics of the ELBO with respect to this variational parameter, showing that as the dataset size increases towards infinity, the batch size must increase towards infinity as well for MCBN to be a valid approximate inference technique.
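The MCBN idea, and the large-batch failure mode noted above, can be seen in a toy 1-D sketch: at test time the input is repeatedly normalised with the statistics of a randomly drawn training mini-batch, and the spread of the outputs is read as epistemic uncertainty. The function and the 1-D setting are illustrative assumptions, not the paper's experimental setup.

```python
import random

def mcbn_predict(x, train_data, batch_size, n_samples=50, eps=1e-5, seed=0):
    """Monte Carlo Batch Normalisation sketch: normalise a test input x
    with the statistics of n_samples randomly drawn training
    mini-batches; return the mean and variance of the outputs.
    The output variance is the MCBN uncertainty estimate."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        batch = rng.sample(train_data, batch_size)
        mean = sum(batch) / batch_size
        var = sum((v - mean) ** 2 for v in batch) / batch_size
        outputs.append((x - mean) / (var + eps) ** 0.5)
    mu = sum(outputs) / n_samples
    sigma2 = sum((o - mu) ** 2 for o in outputs) / n_samples
    return mu, sigma2

train = [0.1 * i for i in range(100)]
# Small batches give noisy statistics and hence non-trivial uncertainty;
# a batch covering most of the dataset collapses the spread.
_, var_small = mcbn_predict(5.0, train, batch_size=4)
_, var_large = mcbn_predict(5.0, train, batch_size=90)
```

As the batch size approaches the dataset size, every sampled batch yields nearly identical statistics, so the predictive variance vanishes regardless of how uncertain the model should be.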


Batch Norm Patent Granted To Google: Is AI Ownership The Gold Rush Of 21st Century?

#artificialintelligence

The machine learning community has witnessed a surge in releases of frameworks, libraries, and software. Tech pioneers like Google, Amazon, and Microsoft have insisted that their intentions behind open-sourcing their technology are benign. However, there has been a growing trend of these tech giants claiming ownership of their innovations. According to a National Bureau of Economic Research study, there were 145 US patent filings that mentioned machine learning in 2010, compared to 594 in 2016. Google in particular filed patents related to machine learning and neural networks 99 times in 2016 alone.


ChronoMID - Cross-Modal Neural Networks for 3-D Temporal Medical Imaging Data

Rakowski, Alexander G., Veličković, Petar, Dall'Ara, Enrico, Liò, Pietro

arXiv.org Machine Learning

ChronoMID builds on the success of cross-modal convolutional neural networks (X-CNNs), making the novel application of the technique to medical imaging data. Specifically, this paper presents and compares alternative approaches - timestamps and difference images - to incorporate temporal information for the classification of bone disease in mice, applied to micro-CT scans of mouse tibiae. Whilst much previous work on diseases and disease classification has been based on mathematical models incorporating domain expertise and the explicit encoding of assumptions, the approaches given here utilise the growing availability of computing resources to analyse large datasets and uncover subtle patterns in both space and time. After training on a balanced set of over 75000 images, all models incorporating temporal features outperformed a state-of-the-art CNN baseline on an unseen, balanced validation set comprising over 20000 images. The top-performing model achieved 99.54% accuracy, compared to 73.02% for the CNN baseline.
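Of the two temporal encodings compared above, a difference image is the simpler to picture: the scan at one timepoint is subtracted pixel-wise from the scan at the next, so the network sees where the anatomy changed rather than two raw intensity maps. A minimal sketch on nested lists (the paper works on micro-CT scans, not toy 2×2 arrays):

```python
def difference_image(img_t0, img_t1):
    """Pixel-wise difference of two registered images taken at
    consecutive timepoints: non-zero only where intensity changed."""
    return [[b - a for a, b in zip(row0, row1)]
            for row0, row1 in zip(img_t0, img_t1)]

scan_t0 = [[0.2, 0.5], [0.1, 0.9]]
scan_t1 = [[0.2, 0.7], [0.4, 0.9]]
diff = difference_image(scan_t0, scan_t1)  # non-zero only where the scans differ
```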


Deep Learning for Audio Transcription on Low-Resource Datasets

Morfi, Veronica, Stowell, Dan

arXiv.org Machine Learning

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance on such low-resource datasets. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that the different training methods have different advantages and disadvantages.